The graphic is made of lines, with x representing time and y an index of how the average consumer feels about the future of the economy (100 is neutral, higher is positive, lower is negative). There is a line for each of 9 chosen countries; each country gets its own facet and its own color.
The stated story is that consumer confidence has fallen across the world in 2022, which it shows well.
I think it’s a good graphic because it gives a satisfying amount of detail while still allowing comparisons between countries. The trend for each country is easy to see, and while comparing one country to another is difficult, the gray lines make comparing one country to the rest of the “world” easy.
I can absolutely make lines and facets, so I can do the core of it. I haven’t done layering like this before, but it seems simple based on the class slides. I don’t know exactly how to make the non-trendline layers like the 100 line, point, and label on the right. It shouldn’t be too complicated once the layers are made, though.
As mentioned I’m not certain how to make the middle line and the text, but I think the slides give enough detail.
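One way to think about the extra layers is that each one needs its own small data frame: the end point and label only need the last observation per country, and the middle line only needs the neutral value. Below is a minimal sketch of that data prep in dplyr. The `confidence` data frame and its column names (`country`, `month`, `index`) are made up for illustration, since the real chart's data isn't shown here, and the charting step itself is left out.

```r
library(dplyr)

# Hypothetical toy data: the real data and column names are assumptions
confidence <- data.frame(
  country = rep(c("A", "B"), each = 3),
  month   = rep(as.Date(c("2022-01-01", "2022-02-01", "2022-03-01")), times = 2),
  index   = c(101, 99, 97, 100, 98, 95)
)

# One row per country: the most recent observation,
# which is what the point and text layers on the right would draw
end_points <- confidence |>
  group_by(country) |>
  slice_max(month, n = 1) |>
  ungroup() |>
  mutate(label = as.character(index))

# A single-row frame for the horizontal reference line at the neutral value
neutral_line <- data.frame(index = 100)

print(end_points)
```

Each of these frames could then be passed to its own layer (a point mark, a text mark, and a rule mark) on top of the main line layer.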
Exercise 2
library(vegabrite)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
vl_chart(width = 600, height = 400, title = "Monthly Temperatures in Seattle") |>
  vl_add_data(weather) |>
  vl_mark_bar() |>
  vl_encode_x("month:O") |>
  vl_encode_y("temp_max:Q") |>
  vl_aggregate_y("mean") |>
  vl_encode_y2("temp_min:Q") |>
  vl_aggregate_y2("mean") |>
  vl_axis_x(title = "Month") |>
  vl_axis_y(title = "Average High and Low Temperature")
Warning: Invalid schema for object passed to or created by
modify_inner_spec.vegaspec_unit
2.3
vl_chart(width = 300, height = 200, title = "Monthly Temperatures in Seattle") |>
  vl_add_data(weather) |>
  vl_mark_line() |>
  vl_facet("weather", columns = 2) |>
  vl_encode_x("month:O") |>
  vl_encode_y("weather:Q") |>
  vl_aggregate_y("count") |>
  vl_axis_x(title = "Month") |>
  vl_axis_y(title = "# Of Days Over 4 Years")
# The Y axis is bad, not sure how to make it what I want without data wrangling
# I think I want a % of days in that month that are that weather
# The shape should be mostly the same though
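The "% of days in that month" idea from the comments can be sketched with a little dplyr wrangling before charting. This is only a sketch under assumptions: `toy_weather` is a made-up stand-in with one row per day and the same `month` and `weather` column names used in the chart above, and `days` and `pct_days` are names I picked.

```r
library(dplyr)

# Hypothetical stand-in for the weather data: one row per day
toy_weather <- data.frame(
  month   = c(1, 1, 1, 1, 2, 2, 2, 2),
  weather = c("rain", "rain", "sun", "fog", "sun", "sun", "rain", "sun")
)

weather_share <- toy_weather |>
  count(month, weather, name = "days") |>       # days of each weather type per month
  group_by(month) |>
  mutate(pct_days = 100 * days / sum(days)) |>  # share of that month's days
  ungroup()

print(weather_share)
```

With the real data, `weather_share` could replace `weather` in the chart, with `pct_days` as a quantitative y field and no count aggregate.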
Something very interesting to me that Tufte brings up is that, for time series, there are graphics I normally wouldn’t think of as representing data. One shows how an animal moves in a series of pictures laid out to show time. The most interesting to me is the graphic of the Rhone bridge collapse, which can’t even assign time to the x axis and instead overlays a before and after. I don’t know how much this will come up, given I expect my data to be numeric, but it’s a reminder to be creative.
Playfair’s chart on wheat prices compared to wages over time (page 34). I chose to look at this graphic because the idea of using 3-4 different marks on one timescale is very interesting. Using a fade-out so the bars and line don’t overlap is clever and not something I’d consider on a computer. X is time in years, Y is shillings (used for both price and weekly wage), and color is used just to distinguish the parts. It’s 3 layers, each of which should be simple. However, the gradient used to add space seems impossible in Vega-Lite. Maybe a hacky solution could emulate it, but I doubt anything that smooth is possible. The point is to compare how prices and wages have changed over time, and the monarchs listed along the top give extra historical context to help the reader.
Exercise 2
The big lesson in this section seemed to be to pay attention to the guides: axes should be labeled, consistent, and intuitive. Try to limit them to what is relevant.
Always be asking what someone who doesn’t know the data or story would see first.
Exercise 3
Part 1.
- Having 2 different Y axes both in % is confusing
- The response rate numbers go over the bars and are hard to read
- The main point feels hard to understand. I think the point is that responses went up, but I assume completion rate is completion given they responded, so when it goes down it’s hard to know if overall there were more or fewer completed responses (especially with the mismatched scales)
Part 2.
- The Y axis can be made into one axis that starts at 0
- The marks can be arranged differently so text isn’t as necessary
- Make response rate the focus, and put completion rate under response to better show their relationship (I’m assuming that’s how they work)
Bar charts are commonly used for representing amounts of things, especially categories of things. The bars should be ordered so they’re easily comparable, but don’t mess with the normal order if that’s more intuitive. Try horizontal bars to save space when the bars’ names are too long to sit side-by-side.
b
You might not use bars when:
- You want the axis to start somewhere other than 0
- There are too many categories, making grouped bars cluttered
c
Dot plots can show the same things as bars while taking up less space. Heatmaps can show 2 categorical variables in a more readable way than grouped bars.
d
If the total of the groups is useful, then stacked bars might be worthwhile. Using facets or offsets is better for comparing the groups against each other.
Figure 6.12 is ‘bad’ because it wastes space showing 0 when it doesn’t matter, making the difference the graph wants to show smaller and harder to read.
Figure 6.13 is ‘bad’ because without any order it’s hard to tell how we’re supposed to compare points. It takes a lot of effort to get a big picture of life expectancy; instead, the reader has to individually find single countries to compare.
Exercise 2
b
The goal is about human perception and interpretation, not just a literal representation of data, so double check your graphics.
Comparing things next to each other is way easier than over a distance, especially if there are a lot of other confusing details.
You need to think about the audience, who may lack context or awareness of how a graph works.
c
At 37:00, the background is on Tufte page 31.
At 41:13, Napoleon’s army in Russia, page 41.
Exercise 3
a
There is a square point mark at every combination of x and y in the frame, then color is the indicator of some quantitative field.
I didn’t find out how to sort by the value at a specific point in time; I may update this later. Same with the NA holes, but those might be easier to handle in R.
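Both of these could probably be handled in R before the data reaches the chart. Here is a minimal sketch with dplyr, assuming a long data frame with one row per country and year. The `usage` data, its column names, and the choice of 2010 as the sorting year are all made up for illustration.

```r
library(dplyr)

# Hypothetical toy data; note that country C has no row for 2000
usage <- data.frame(
  country = c("A", "A", "B", "B", "C"),
  year    = c(2000, 2010, 2000, 2010, 2010),
  value   = c(10, 50, 5, 80, 30)
)

# Order countries by their value in one chosen year (2010 here)
country_order <- usage |>
  filter(year == 2010) |>
  arrange(desc(value)) |>
  pull(country)

# Build every country/year combination so missing cells become explicit NA rows
full_grid <- expand.grid(
  country = unique(usage$country),
  year    = unique(usage$year),
  stringsAsFactors = FALSE
)

usage_full <- full_grid |>
  left_join(usage, by = c("country", "year")) |>
  mutate(country = factor(country, levels = country_order))

print(usage_full)
```

The factor levels in `country_order` carry the sort order, and the explicit NA rows mean the holes can be drawn in a neutral color instead of being left blank.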
c
Color differences are hard to see when not right next to each other
d
There is enough of a trend to make a clear gradient, and the story doesn’t need so much detail that the exact values matter. The location of the dark bars is the story rather than their exact color. I don’t think it’s a dealbreaker for the figure in part b, but it is a problem. It’s hard to get a reference for any of the exact values, just which country uses more in general in a given year. It makes the figure less detailed than would be nice, but the big picture is mostly intact.
Exercise 4
a
Comparing the position of points/bars is way easier than the angle/distance of an arc. Bars are usually better.
b
He argues that pie charts have at least some use, like quickly showing which party is in the majority, since judging whether a slice is >50% is pretty simple. Bars are better for relative sizes but might not show proportion as easily.
c
He compares them to stacked bars and side-by-side bars. Pies are best for quickly showing very simple fractions like 1/2; otherwise, stacked bars can also show proportion and side-by-side bars can also show small datasets. For relative proportions and many small parts, side-by-side bars are best, and stacked bars are great for time series since they’re basically one dimensional.